I. INTRODUCTION


This grant concerned the use of vector processors such as
the CRAY-1 in the solution of sparse systems of equations. The
study produced three major classifications of results:

(1)  Algorithms  and  related  mathematical  software  for 
sparse  solution  on  single  processors  (uniprocessors). 

(2) Preliminary projection of vector multiprocessor
performance on linear algebra codes.

(3) Cooperative work on vector sparse matrix algorithms
with AFFDL for CFD and structures codes, and with
UC/Berkeley for circuit simulation.


II.  TECHNICAL  SUMMARY 


A.  Uniprocessor  Studies 

Figure  1  indicates  how  the  single  topic  of  general  sparse 
matrix  solution  using  scalar  processors  may  be  broken  into 
specialized  areas  of  study  when  implementation  on  vector 
architectures  is  considered. 

First, highly sparse matrices, usually representing
ODE/algebraic-modeled systems, are easily decoupled by
reordering. At a minimum, locally-decoupled equations may be
solved in pipelined scalar mode; if the decoupled subsystems can
be arranged (a) to have identical sparsity, and (b) to be stored
a constant stride apart, then a simultaneous sparse solver may be
invoked and a vector solution obtained.
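
The two conditions above can be illustrated with a minimal sketch. It is
not the grant's CRAY-1 software: the tridiagonal pattern, the function name
solve_simultaneous_tridiagonal, and the flat storage layout (entry i of
system k at index k*stride + i) are all assumptions chosen for illustration.
The point is the loop order: the outer loop walks the shared sparsity
pattern once, and the inner loop runs across the systems at constant
stride, which is the loop a vector processor would execute.

```python
def solve_simultaneous_tridiagonal(lower, diag, upper, rhs, n, m, stride):
    """Solve m independent n-by-n tridiagonal systems sharing one pattern.

    Assumed layout: entry i of system k is at flat index k*stride + i in
    every array. lower[k*stride + 0] and upper[k*stride + n-1] are unused.
    No pivoting is done, so each system is assumed diagonally dominant.
    Returns a new flat list holding the m solutions in the same layout.
    """
    d = list(diag)
    x = list(rhs)
    # Forward elimination. The inner loop over k touches elements a
    # constant stride apart: this is the "vector" loop.
    for i in range(1, n):
        for k in range(m):
            base = k * stride
            factor = lower[base + i] / d[base + i - 1]
            d[base + i] -= factor * upper[base + i - 1]
            x[base + i] -= factor * x[base + i - 1]
    # Back substitution, again with the loop over systems innermost.
    for k in range(m):
        base = k * stride
        x[base + n - 1] /= d[base + n - 1]
    for i in range(n - 2, -1, -1):
        for k in range(m):
            base = k * stride
            x[base + i] = (x[base + i] - upper[base + i] * x[base + i + 1]) / d[base + i]
    return x


# Usage: three 4x4 systems (diagonal 4, off-diagonals -1) stored 5 apart.
n, m, stride = 4, 3, 5
size = m * stride
lower = [-1.0] * size
diag = [4.0] * size
upper = [-1.0] * size
rhs = [0.0] * size
for k in range(m):
    c = float(k + 1)
    # Right-hand side chosen so that system k has the solution [c, c, c, c].
    rhs[k * stride:k * stride + n] = [3 * c, 2 * c, 2 * c, 3 * c]
solution = solve_simultaneous_tridiagonal(lower, diag, upper, rhs, n, m, stride)
```

With this ordering the vector length equals the number of subsystems m,
independent of how sparse each subsystem is.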

As  sparse  systems  become  locally  coupled  -  as  occurs  in 
finite  element  and  finite  difference  problems  -  then  vectors  are 
easily  defined  within  the  coupled  subsystems.  It  is  worth 
making  a  further  distinction  between 

(a) intra-nodal or intra-element coupling, where the
dimension of dense submatrices (and hence the vector
length) is proportional to the number of unknowns/node
or unknowns/finite element, and

(b)  inter-nodal  or  inter-element,  where  the  coupling 
between  grid  nodes  or  finite  elements  determines  the 
vector  length. 

Banded and profile matrices result from the latter. The
associated vector lengths are the products of the number of
unknowns/node (element) and the number of coupled nodes. These
lengths are therefore always longer than in the former case, so
that common bandsolvers potentially offer the highest execution
rate (MFLOPS) of any sparse solver applicable to finite element
problems.
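
As a rough illustration of where the vector length appears in a
bandsolver, the following sketch performs banded Gaussian elimination
without pivoting. It is not the software of [13] [14]: the name band_solve,
the dense row storage, and the choice of a half bandwidth equal to
(unknowns/node) x (coupled nodes) are assumptions made to keep the example
short. The innermost loop over j is the vector operation, and its length
is bounded by the half bandwidth.

```python
def band_solve(a, b, half_bw):
    """Solve a x = b for a banded matrix, without pivoting.

    a is a list of dense rows, used only inside the band |i - j| <= half_bw
    (dense storage is an assumption made for brevity). The matrix is
    assumed diagonally dominant so that pivoting is unnecessary.
    """
    n = len(b)
    a = [row[:] for row in a]   # work on copies, leave the inputs intact
    x = list(b)
    for k in range(n - 1):
        last = min(k + half_bw, n - 1)
        for i in range(k + 1, last + 1):
            factor = a[i][k] / a[k][k]
            # Vector operation: at most half_bw elements of row i updated.
            for j in range(k + 1, last + 1):
                a[i][j] -= factor * a[k][j]
            x[i] -= factor * x[k]
    # Back substitution, again confined to the band.
    for i in range(n - 1, -1, -1):
        last = min(i + half_bw, n - 1)
        s = x[i]
        for j in range(i + 1, last + 1):
            s -= a[i][j] * x[j]
        x[i] = s / a[i][i]
    return x


# Usage with hypothetical sizes: 2 unknowns/node, 2 coupled nodes.
unknowns_per_node = 2
coupled_nodes = 2
half_bw = unknowns_per_node * coupled_nodes   # assumed vector length of 4
n = 10
a = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(max(0, i - half_bw), min(n, i + half_bw + 1)):
        a[i][j] = 10.0 if i == j else -1.0
b = [sum(row) for row in a]   # right-hand side for a solution of all ones
x = band_solve(a, b, half_bw)
```

Doubling either the unknowns per node or the number of coupled nodes
doubles the length of the inner loop in this sketch.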

Algorithms  and  CRAY-1  mathematical  software  have  been 
developed  on  this  grant  for 

(a)  general  sparse  matrices,  solved  in  traditional  [15] 
and  reorganized  pipelined-scalar  mode  [7] [16]; 

(b)  patterned  sparse  matrices,  in  conjunction  with  a 
vectorized  electronic  circuit  analysis  program  [2]; 

(c)  blocked  matrices  arising  from  inter-nodal  coupling 
[12];

(d)  both  symmetric  and  unsymmetric  banded  and  blocked- 
profile  matrices  [13]  [14]; 

(e)  simultaneous  blocked  tridiagonal  systems,  as  arising 
in CFD [17].

Mathematical software resulting from these studies has
been collected in a library [17].

B.  Multiprocessor  Studies 

In the 1980's, achieving GIGAFLOP performance with clock
times in the range of 10 nsec will require multiple vector
processors executing cooperative tasks. The ability of
extensions  of  the  present  CRAY  family  of  processors  to  execute 
small  tasks  in  an  efficient  manner  was  initially  studied  by 
developing  a  multi-processor  extension  of  the  CRAY-1  simulator 
[18].  Several  algorithms  were  studied  for  the  triangular 
factorization of small matrices. Figure 2 shows representative
efficiencies* for the cooperative LU factorization of matrices,
with the number of processors (p) ranging from 2 to 16. These
results were compared with 2-processor CRAY X-MP timings, using
an experimental in-house multitasking operating system at Cray
Research, Inc.
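
The efficiency measure used for Figure 2 can be computed as in the short
sketch below, assuming the footnoted definition reads efficiency =
(uniprocessor time)/(p*(multiprocessor time)). The function name and the
timings in the usage lines are hypothetical and are not results from this
study.

```python
def efficiency(uniprocessor_time, multiprocessor_time, p):
    """Parallel efficiency: uniprocessor time / (p * multiprocessor time).

    A value of 1.0 means the p processors are perfectly utilized.
    """
    if p < 1 or multiprocessor_time <= 0:
        raise ValueError("need p >= 1 and a positive multiprocessor time")
    return uniprocessor_time / (p * multiprocessor_time)


# Hypothetical timings (arbitrary units), for illustration only.
eff_2 = efficiency(8.0, 4.0, 2)     # ideal 2-processor speedup
eff_16 = efficiency(8.0, 1.0, 16)   # 8x speedup on 16 processors
```

In this example the 16-processor run is eight times faster than the
uniprocessor run, yet only half of the available processor time is doing
useful work, which is why efficiency rather than raw speedup is reported
when tasks are small.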

This  work  will  continue  under  a  new  AFOSR  grant. 

C.  Air  Force-related  Applications 

1. Electronics. A Ph.D. student from UC/Berkeley engaged
in  electronic  circuit  simulation  was  a  visitor  to  our  research 
group  during  the  summer  of  1980  to  study  the  CRAY-1  and  our 
sparse  matrix  research.  Under  subsequent  Bell  Laboratories  and 
AFOSR  auspices,  he  produced  a  series  of  papers  and  a  vectorized 
version  of  the  SPICE  electronic  circuits  analysis  program  which 
achieves  a  5-10:1  speedup  on  the  CRAY-1.  The  speedup  achieved 
from simultaneous sparse solution of subcircuit matrices,
originally proposed in [2], is the critical feature of this
program. 

2. Aerodynamics. Work was completed in 1980 on FDL-sponsored
research on vectorization of computational fluid
dynamics (CFD) codes. A four-day seminar on the general topic
of vector processing was presented at FDL in 1980.

Air  Force  sponsored  research  on  vectorized  CFD  algorithms 
has led to related work currently sponsored by NASA/Ames
Research  Center. 

3. Structures. From 1981 to 1983, two related finite
element analysis and optimization codes from FDL were
vectorized, under joint AFOSR-FDL sponsorship. By far the
greatest speedup (> 2000:1) was due to vectorized banded
equation solvers developed under AFOSR sponsorship [14]. A
report, including comparisons with NASTRAN, has been written on
these results [20].

*Efficiency (η) = (uniprocessor time)/(p*(multiprocessor time))

III.  OTHER  COUPLING  AND  PROFESSIONAL  ACTIVITIES 

A.  Seminars 

1.  Washington  State  University  (11/17/80) 

2.  University  of  California,  Berkeley  (11/19/80) 

3.  University  of  Texas,  Austin  (10/30/80) 

4. 4-day seminar at AFFDL (6/80)

5.  Seminar  at  LANL  (8/80) 

6.  Review  of  vector  processing  research,  AFFDL 
(5/26/81;  7/14/83) 

7.  Review  of  the  state-of-the-art  in  scientific 
computation  at  AFOSR  (5/6/82). 

B. Visiting Scientist and Consulting

1.  Visiting  scientist,  AFFDL,  to  give  instruction  on 
algorithms  for  vector  processing,  and  to  study  I/O 
problems  associated  with  Navier-Stokes  codes  on  the 
CYBER 203/205 (5/1/80 - 9/30/80).

2.  Visiting  scientist,  LANL,  on  vectorized  Monte  Carlo 
(5/1/80  -  9/30/81). 

3.  Visiting  scientist,  LANL,  on  performance  of  PIC 
codes  on  CRAY-2  (5/1/83  -  8/1/83). 

4.  Visiting  scientist,  LANL,  on  task  granularity  on 
vector  multiprocessors  (10/1/83  -  4/30/84). 


5.  Industrial  consultant,  Mobil  Research  and 
Development, on the vectorization of 3-D diffusion
codes  associated  with  oil  reservoir  drilling  and 
management  (5/1/80  -  1/15/82). 

6. Industrial consultant, Chevron Oil Field Research
Co.,  on  organization  of  vectorized  sparse  matrix 
algorithms  (2/82)  and  on  vector  multiprocessors 
(12/83). 

7. Consultant, LLNL, on vectorized Monte Carlo (5/1/82 -
9/30/83).

8.  Visiting  scientist,  AFFDL,  on  vectorized  structural 
analysis and optimization techniques (5/1/82 -
9/30/83).

9. Consultant, Cray Research, Inc., to profile
projected CRAY-2 performance using instruction-level
simulation (1/83 - ).

C.  Related  Research 

1. Principal investigator, NASA/ARC, on vectorization
of computational chemistry codes (8/1/82 - ).

2. Principal investigator, NASA/ARC, on preparation of
scientific library for the CRAY-2 (12/5/83 - ).

D.  Professional 

1.  A  one-week  short  course  on  Vector  Processing  was 
organized  and  presented  at  the  University  of 
Michigan  during  the  summers  of  1980,  1981,  and 
1982. 

2.  As  an  appointed  member  of  a  NASA  Technical  Review 
Board,  an  evaluation  was  made  of  proposals  from 
Control Data Corporation and the Burroughs
Corporation for the $100 million Numerical
Aerodynamic Simulator (5/1/83 - 7/31/83).

3. Editor, IEEE Transactions on Computers, 8/1/82 -
12/31/83.